ABSTRACT 

Item bias is defined as the dependence of item 
responses and group membership conditional on the value of the trait 
that the test is supposed to measure. The results of item bias 
detection methods based on this conditional definition and using a 
stepwise or iterative procedure appear to be adequate. In this paper, 
experimental studies in the Netherlands on the explanation of item 
bias are reported. For each of the 60 items of an arithmetic test, an 
assessment was made as to whether the item was biased between Dutch 
and Turkish/Moroccan students at the end of their sixth-grade year. 
Hypotheses were formulated to explain the bias. According to the 
hypotheses, biased items were modified to become less biased and 
unbiased items were modified to become more biased. The original and 
modified test versions were randomly administered to each of 169 
students of Dutch origin and 93 students of Turkish or Moroccan 
origin. The statistical tests showed that the hypothesis was 
confirmed in only three of the 38 cases. (Author/TJB) 
Item bias is defined as the dependence of item responses and group membership 
conditional on the value of the trait that the test is supposed to measure. The results 
of item bias detection methods based on this conditional definition and using a 
stepwise or iterative procedure appear to be adequate. In this paper experimental 
studies on the explanation of item bias are reported. For each of the 60 items of an 
arithmetic test it was investigated whether the item was biased between Dutch and 
Turkish/Moroccan students at the end of the sixth grade. Hypotheses were 
formulated to explain the bias. According to the hypotheses, biased items were 
modified to become less biased and unbiased items were modified to become more 
biased. The original and modified test versions were randomly assigned to each of 
169 students of Dutch origin and 93 students of Turkish or Moroccan origin. The 
statistical tests showed that the hypothesis was confirmed in only three of the 38 
cases. 



Key Words: Item bias, Iterative Logit Method, definition of item bias, experimental 
research on the explanation of item bias. 






Definition of Item Bias 

Item bias research investigates whether educational or psychological 
constructs are measured differently across groups. Item bias research usually starts 
with the observation that group membership is associated with item responses, e.g. 
the item scores are higher for Whites than for Blacks. The situation is shown in 
Figure 1(a). 



Insert Figure 1 about here 



The rectangles indicate observed variables. The rectangle denoted Group indicates 
an observed nominal variable for group membership, such as Black and White; the 
rectangle denoted Item indicates the observed item responses, such as Correct and 
Incorrect on an Arithmetic item. The double-headed arrow indicates the association 
between the two variables, e.g. one group tends to have more correct answers on 
the item than the other group. 

The finding that Group and Item are associated is, however, not sufficient 
evidence for the statement that the item is biased. For example, it might be that one 
group is truly better in arithmetic than the other group and that, therefore, group 
membership is associated with the responses on an item measuring arithmetic. This 
means that a latent Trait, such as latent Arithmetic Ability, is used for explaining 
the association between Group and Item. The situation where the latent Trait can 
explain the association between Group and Item is shown in Figure 1(b). The circle 
indicates a latent variable and the arrow a causal influence. Trait and Group are 
associated, i.e. one group has lower ability than the other group. The item of Figure 
1(b) is defined to be unbiased. The latent Trait can explain the association 
between Group and Item. The groups differ in latent ability, but given the level of 
the latent trait Item and Group are independent. In the literature on contingency 
table methodology this situation has been called conditional independence (see, for 
example, Fienberg, 1980, p. 28): conditional on the level of the Trait the observed 
variables Item and Group are independent. In more common language, the Trait is 
the third variable that is responsible for the correlation between the other 
variables. 

From the definition of an unbiased item follows immediately the definition 
of a biased item: given the level of the latent Trait, Item and Group are dependent. 
The situation is shown in Figure 1(c). 
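
The conditional definition can be illustrated with a small sketch. It assumes, purely for illustration, that the trait is observed at two levels; all counts, the group labels "A" and "B", and the function names are hypothetical and do not come from the studies reported here. In the first table the groups differ in their overall proportion correct, yet within each trait level the proportions are equal (the unbiased case of Figure 1(b)); in the second table a difference remains within trait levels (the biased case of Figure 1(c)).

```python
# Hypothetical counts: table[trait_level][group] = (n_correct, n_total).
# All numbers are made up for illustration only.
counts_unbiased = {
    "low":  {"A": (20, 100), "B": (8, 40)},
    "high": {"A": (32, 40),  "B": (80, 100)},
}
counts_biased = {
    "low":  {"A": (20, 100), "B": (4, 40)},
    "high": {"A": (32, 40),  "B": (60, 100)},
}


def marginal_proportion(table, group):
    """Proportion correct for a group, ignoring the trait level."""
    correct = sum(table[level][group][0] for level in table)
    total = sum(table[level][group][1] for level in table)
    return correct / total


def max_conditional_gap(table, g1="A", g2="B"):
    """Largest between-group difference in proportion correct
    within a single trait level."""
    gaps = []
    for level in table:
        c1, n1 = table[level][g1]
        c2, n2 = table[level][g2]
        gaps.append(abs(c1 / n1 - c2 / n2))
    return max(gaps)


for name, table in [("unbiased", counts_unbiased), ("biased", counts_biased)]:
    print(name,
          round(marginal_proportion(table, "A"), 3),
          round(marginal_proportion(table, "B"), 3),
          round(max_conditional_gap(table), 3))
```

In the hypothetical unbiased table the marginal proportions differ only because the groups are distributed differently over the trait levels, which is the distinction the definition is meant to capture.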
It is remarked that the definition does not depend on the measurement level 
of the three variables. Usually item bias is described in terms of a dichotomous 
response variable (e.g. correct/incorrect), a nominal group membership variable 
(e.g. Black/White), and a latent variable at interval level. But other types of 
measurement scales are conceivable, and they fit in the general definition. 

For the special case of a dichotomous item response (e.g. correct/incorrect), 
nominal group membership (e.g. Black/White), and a latent trait at the interval 
level another definition is used: An item is unbiased if its item characteristic curves 
are identical across groups; otherwise the item is biased. In the special case of a 
dichotomous response variable, nominal group membership, and an interval latent 
variable the two definitions are identical (Mellenbergh, 1988). 
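
The item characteristic curve definition can be sketched as follows. The paper does not specify a functional form for the curves; the two-parameter logistic form, the parameter values, and the function names below are assumptions made only for illustration.

```python
import math


def icc(theta, a, b):
    """Two-parameter logistic item characteristic curve (an assumed form):
    probability of a correct response at trait value theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def max_icc_difference(params_g1, params_g2, grid):
    """Largest vertical distance between the curves of two groups on a grid."""
    return max(abs(icc(t, *params_g1) - icc(t, *params_g2)) for t in grid)


# Trait values from -3.0 to 3.0 in steps of 0.1.
grid = [x / 10 for x in range(-30, 31)]

# Hypothetical (a, b) parameters per group.
same = max_icc_difference((1.0, 0.0), (1.0, 0.0), grid)     # identical curves
shifted = max_icc_difference((1.0, 0.0), (1.0, 0.5), grid)  # harder for group 2

print(same, round(shifted, 3))
```

Identical parameters give a maximum difference of zero (an unbiased item under this definition); a difference in the difficulty parameter gives curves that differ at every trait value (a biased item).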

Item Bias Detection 

The main problem in item bias detection is the measurement of the latent 
trait. Usually a trait is measured using an educational or psychological test. In 
classical psychometrics the total test score is used as an indication of the latent trait, 
whereas in modern psychometrics the item responses are used for estimating the 
latent scores. But in both approaches the same circularity applies: If the test 
contains biased items, the measurement of the latent trait is not free of the bias that 
is investigated. 

Several methods for item bias detection have been developed; for a review 
see Mellenbergh (1988). But in all methods the above-mentioned circularity 
remains: A biased measure of the latent trait is used for investigating item bias. 

To break through the circularity Lord (1980, sec. 14.^) proposed a stepwise 
procedure. In the first step the total test score is used for estimating the subjects' 
latent trait values and for computing item bias statistics. In the second step the 
biased items are excluded from the test and the reduced test is used for estimating 
the latent trait values and for investigating item bias. Van der Flier, Mellenbergh, 
Adèr, and Wijn (1984) developed a completely iterative procedure. This so-called 
Iterative Logit Method appeared to be very efficient in detecting simulated biased 
items (Van der Flier, Mellenbergh, & Adèr, 1984; Van der Flier, Mellenbergh, Adèr, 
& Wijn, 1984) and in detecting experimentally induced biased items (Kok, 
Mellenbergh, & Van der Flier, 1985). It is remarked that other item bias detection 
methods can also be easily extended to iterative procedures and that they might be 
very efficient as well. 
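
The logic of an iterative procedure can be sketched in a few lines. This is not the Iterative Logit Method itself: as a stand-in bias statistic the sketch uses a simple difference in proportion correct between two groups, conditional on the score on the currently unflagged items, and the threshold, the function names, and the toy data are all hypothetical. Only the loop structure (flag items, recompute the matching score without them, repeat until the flagged set is stable) reflects the procedure described above.

```python
from collections import defaultdict
from itertools import product


def standardized_difference(responses, groups, item, matching_items):
    """Difference in proportion correct on `item` between group 0 and
    group 1, conditional on the score on `matching_items` (the studied
    item excluded), averaged with group 1 stratum sizes as weights."""
    # score -> [[n_correct, n_total] for group 0, same for group 1]
    cells = defaultdict(lambda: [[0, 0], [0, 0]])
    for person, g in zip(responses, groups):
        score = sum(person[j] for j in matching_items if j != item)
        cells[score][g][0] += person[item]
        cells[score][g][1] += 1
    weighted, weight = 0.0, 0
    for (c0, n0), (c1, n1) in cells.values():
        if n0 and n1:  # use only strata in which both groups occur
            weighted += n1 * (c0 / n0 - c1 / n1)
            weight += n1
    return weighted / weight if weight else 0.0


def iterative_detection(responses, groups, threshold=0.1, max_iter=20):
    """Flag items, rebuild the matching score from unflagged items,
    and repeat until the set of flagged items no longer changes."""
    n_items = len(responses[0])
    flagged = set()
    for _ in range(max_iter):
        clean = [j for j in range(n_items) if j not in flagged]
        new_flagged = {
            j for j in range(n_items)
            if abs(standardized_difference(responses, groups, j, clean)) > threshold
        }
        if new_flagged == flagged:
            break
        flagged = new_flagged
    return flagged


# Toy data: group 0 shows every response pattern on four items once;
# group 1 shows the same patterns, except that item 0 is always failed.
group_a = [list(p) for p in product([0, 1], repeat=4)]
group_b = [[0] + p[1:] for p in group_a]
responses = group_a + group_b
groups = [0] * len(group_a) + [1] * len(group_b)

print(sorted(iterative_detection(responses, groups)))
```

In the toy data only item 0 differs between the groups, so only item 0 should remain flagged once the matching score no longer contains it.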



Explanation of Item Bias 



In many applications the user is satisfied with the detection of items that are 
biased with respect to certain groups. The items are removed from the test and it is 
claimed that the test is fair with respect to the groups that have been investigated. 
But an important question remains: Why are these items biased? The answer to this 
question is not only of academic interest but is also relevant for test 
construction. If the biasing factors are known, the test constructor can prevent the 
occurrence of biased items. 

Suppose an item is biased; the bias is graphically represented in Figure 1(c). 
In this figure, Item and Group are associated as indicated by the double-headed 
arrow between Item and Group: conditional on the value of the latent trait, Item and 
Group are dependent, which is the definition of item bias. Further, suppose that 
next to the first trait a second trait is measured by the item. The second trait is an 
explanation of the item bias when the bias disappears by introducing the second 
trait. This situation is graphically displayed in Figure 1(d). In Figure 1(c) the item is 
biased, but in Figure 1(d) the bias has disappeared by introducing the second trait. 
This analysis shows that the search for explanation can be described as 'finding the 
biasing traits' (Mellenbergh & Kok, 1988). 

Mellenbergh and Kok (1988) described four research strategies for 
explaining item bias: (1) qualitative, (2) correlational, (3) quasi-experimental, and 
(4) experimental. In the remainder of this study experiments on the explanation of 
item bias are reported. 

Experiments 

The studies were inspired by a similar experiment of Scheuneman (1987). 
One experiment (Groen, 1988) is completed, whereas the analysis of the data of the 
second experiment (Molendijk, in preparation) is in progress. The first experiment 
is described in some detail. 

The test is a 60-item multiple-choice test on arithmetic, administered at the 
end of primary school in the Netherlands. Using the Iterative Logit Method the 
items were investigated on item bias in a group of 2500 Dutch students and 
students of Moroccan and Turkish origin at Dutch schools. Twelve of the items 
appeared to be biased between the two groups. 



A second version of the test was prepared: The biased items were modified 
and a number of unbiased items were also modified. Some of the biased items were 
modified more than once and different modifications of the original biased item 
were included in the second version of the test. 

Hypotheses 

The items were modified according to one of four hypotheses on the 
explanation of item bias. 

First, it was hypothesized that the plausibility of incorrect options can cause 
bias. For seven unbiased items plausible incorrect options were replaced by less 
plausible ones; for three biased items less plausible incorrect options were replaced 
by plausible ones. An example is given in Figure 2. 



Insert Figure 2 about here 



Second, it was hypothesized that lack of time or fatigue can cause item bias. 
Four biased items that were at the end of the original test were placed at the 
beginning of the modified test. Four unbiased items at the beginning of the original 
test were placed at the end of the modified test. 

Third, it was hypothesized that the knowledge of words or expressions can 
cause bias. In six unbiased items words or expressions were replaced by harder 
words or expressions; in two biased items words or expressions were replaced by 
easier words or expressions. 

Fourth, it was hypothesized that the complexity of the item can cause item 
bias. Four unbiased items were reformulated to be more complex and eight 
biased items were reformulated to be less complex. 

Subjects 

The test was administered at eighteen schools in Amsterdam. The schools are 
in neighbourhoods with many Turkish and Moroccan immigrants. One of the two 
versions of the test was randomly assigned to each of a group of 262 students, consisting of 
169 students of Dutch origin and 93 students of Turkish and Moroccan origin. 
Data analysis 



For each of the items, per cell of the 2 (test versions) x 2 (Dutch/Turkish or 
Moroccan) design the proportion of correct answers was computed. An example is 
given in Table 1. The proportions were analyzed using the logit model (Fienberg, 
1980). 



Insert Table 1 about here 



According to the hypothesis the biased items were modified to become less 
biased and the unbiased items were modified to become more biased. In technical 
terms this means that in the logit model the interaction of group x test version is of 
interest. For each of the items the null hypothesis that the interaction parameter is 
zero was tested at the 5% significance level. Table 1 shows that the difference in the 
proportions between the two groups is smaller for the modified item than for the 
original item, which means that the bias has decreased. But the statistical test 
shows that the effect is not significant at the 5% level. 
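
The interaction test can be sketched for the proportions of Table 1. In a saturated logit model for a 2 x 2 design the interaction equals the difference between the two between-group logit differences, and a large-sample (Wald) standard error follows from the cell sizes. The cell sizes per test version are not reported here, so the roughly equal split of the 169 and 93 students used below is an assumption, as are the variable and function names; the result is therefore only indicative and is not the analysis actually carried out.

```python
import math


def logit(p):
    """Log odds of a proportion."""
    return math.log(p / (1.0 - p))


# Proportions correct for item no. 33, taken from Table 1.
p = {
    ("original", "dutch"): 0.60, ("original", "tm"): 0.40,
    ("modified", "dutch"): 0.63, ("modified", "tm"): 0.53,
}

# ASSUMPTION: cell sizes are not reported; a roughly equal split of the
# 169 Dutch and 93 Turkish/Moroccan students over the versions is used.
n = {
    ("original", "dutch"): 85, ("modified", "dutch"): 84,
    ("original", "tm"): 47, ("modified", "tm"): 46,
}


def interaction_test(p, n):
    """Group x test version interaction on the logit scale and its
    large-sample (Wald) z statistic."""
    gap_original = logit(p[("original", "dutch")]) - logit(p[("original", "tm")])
    gap_modified = logit(p[("modified", "dutch")]) - logit(p[("modified", "tm")])
    interaction = gap_modified - gap_original
    # Approximate variance of an observed logit: 1 / (n * p * (1 - p)).
    variance = sum(1.0 / (n[cell] * p[cell] * (1.0 - p[cell])) for cell in p)
    return interaction, interaction / math.sqrt(variance)


interaction, z = interaction_test(p, n)
print(round(interaction, 3), round(z, 2))
```

Under the assumed cell sizes the interaction is negative (the between-group gap is smaller for the modified item), but its z statistic stays well inside the two-sided 5% critical values of plus and minus 1.96, in line with the non-significant result reported for this item.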



Results 

In total 38 items were modified. In only three of these 38 cases the 
interaction parameter is significant at the 5% level. 
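
To put three significant results out of 38 in perspective, a rough calculation shows how many significant results would be expected if none of the modifications had any effect. The calculation assumes that the 38 tests are independent and that every null hypothesis is true; neither assumption is made in the study itself, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """Probability of k or more significant results among n independent
    tests at level alpha when every null hypothesis is true."""
    return 1.0 - sum(
        comb(n, i) * alpha**i * (1.0 - alpha) ** (n - i) for i in range(k)
    )


n_tests, alpha = 38, 0.05
expected_by_chance = n_tests * alpha
p_three_or_more = prob_at_least(3, n_tests, alpha)

print(round(expected_by_chance, 2), round(p_three_or_more, 3))
```

Under these assumptions about two significant results would be expected by chance alone, and three or more would occur in a sizeable fraction of replications, so the three significant interactions offer little support for the hypotheses.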



Second experiment 

In a second experiment (L. Molendijk) some other hypotheses were tested. For 
example, the hypothesis was tested that the use of decimals in the arithmetic items 
could cause the bias. The design of this experiment is similar to the design of the 
first experiment. The only difference is that the same subjects were tested 
twice: once with the original test and once with the modified 
test. The data are not yet completely analyzed, but the preliminary 
analyses show the same results as the first experiment: In general the hypotheses 
are not confirmed. 
Usually very broad traits are mentioned as explanations of item bias, e.g. the 
mastery of the item language. In these experiments very specific hypotheses were 
used: they were formulated at the concrete level of each of the items. Our 
preference is for rather specific hypotheses because they give more 
insight into the process that causes the bias. 

A disadvantage of a specific hypothesis is, however, that it may be 
misspecified. Anyway, it appears to be very hard to find the biasing traits. 
Figure 1. Graphical display of (a) association of group membership 
and item responses, (b) an unbiased item, (c) a biased item, 
and (d) a biased item where the bias disappears by 
introducing an additional trait. 
Original item 

457 - 2,34 = 

A. 2,23 
B. 454,66 
C. 454,76 
D. The correct answer 
   is not given 

Modified item 

457 - 2,43 = 

A. 453,69 
B. 454,66 
C. 454,76 
D. The correct answer 
   is not given 

Figure 2. Example of a biased item (no. 33) where the less plausible 
option A is replaced by a more plausible one. 
Table 1 

Proportion of correct answers per cell of the 
2 (versions) x 2 (groups) design, item no. 33 

                               Group 
Test version          Dutch      Turkish/Moroccan 
                     (N=169)         (N=93) 

Original (biased)      .60            .40 
Modified               .63            .53 